Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Fixed word-aligned partition compression algorithm of inverted list based on directed acyclic graph
JIANG Kun, LIU Zheng, ZHU Lei, LI Xiaoxing
Journal of Computer Applications    2021, 41 (3): 727-732.   DOI: 10.11772/j.issn.1001-9081.2020060874
Abstract467)      PDF (905KB)(427)       Save
In Fixed Word-Aligned (FWA) inverted index compression algorithms of Web search engines, due to the "greedy" block partition strategy of the inverted list and the interleaved storage of the codeword information, it is difficult for the algorithm to achieve the optimal compression performance. To solve the above problem, an FWA partition compression algorithm based on Directed Acyclic Graph (DAG) was proposed. Firstly, considering the small integer information in the inverted list brought by the clustering characteristics of Web pages, a novel FWA compression format with data area of 64-bit blocks was designed. In this compression format, the data area was divided into 16 storage modes suitable for continuous small integer compression through 4-bit selector area, and the selector area and data area in each block of the inverted list were stored separately, so as to ensure good batch decompression performance. Secondly, a DAG described Word-Aligned Partition (WAP) algorithm was proposed on the basis of the new compression format. In the algorithm, the inverted list block partitioning problem was regarded as a Single-Source Shortest-Path (SSSP) problem by DAG, and by considering the constraints of various storage modes of data area in FWA compression format, the structure and recursive definition of the SSSP problem were determined. Thirdly, the dynamic programming technique was used to solve the problem of SSSP and generate the pseudo-code and algorithm complexity of the optimal partition vector. The original storage modes of traditional FWA algorithms such as S9, S16 and S8b were partitioned and optimized based on DAG, and the computational complexities of the algorithms before and after optimization were compared and analyzed. Finally, the compression performance experiments were conducted with simulation integer sequence data and Text REtrieval Conference (TREC) GOV2 Web page index data. Experimental results show that, compared with the traditional FWA algorithms, the DAG based FWA partition algorithm can improve the compression ratio and decompression speed by batch decompression and partition optimization technology. At the same time, it can obtain a higher compression ratio than the traditional Frame Of Reference (FOR) algorithms for the compression of continuous small integer sequence.
Reference | Related Articles | Metrics